Strings

In previous lectures we have seen strings being used numerous times. Today we are going to go into a bit more detail. First, some terminology:

  • 'single-quote character' refers to unicode character 39, --> { ' }
  • 'double-quote character' refers to unicode character 34, --> { " }
  • and if I say 'quote-character' then I refer to both/either of the above.

Okay, let's begin!

What is a string?

A string is basically a sequence of unicode characters, which makes strings the ideal data type for storing written text. The syntax is:

{quote-character} unicode characters {MATCHING quote-character}

Examples:

  • "hello" # Valid syntax
  • 'hello' # also valid syntax
  • "hello' # doesn't work; the quote-characters don't match.
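A quick way to convince yourself that the choice of quote-character only affects how you *type* the string, not the string you get, is to compare the two versions directly. A minimal sketch:

```python
# The same text written with each kind of quote-character
double_quoted = "hello"
single_quoted = 'hello'

# The quotes are not part of the string, so both values are identical
print(double_quoted == single_quoted)  # True
print(len(double_quoted))              # 5, the quotes are not counted
```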

And just as with numbers, we can also convert other data-types to strings using the str function.
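str() is not limited to numbers. The sketch below converts a few other data-types; the particular values are just illustrations.

```python
# str() returns a text representation of (almost) any value
print(str(True))       # True
print(str(None))       # None
print(str([1, 2, 3]))  # [1, 2, 3]

# The results really are strings
print(type(str(True)))  # <class 'str'>
```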

Why do both single and double quote characters work?

The reason Python accepts single OR double quote characters is to make it easier to deal with text that itself contains quote-characters. Suppose, for instance, we wanted to store the following sentence as a string:

"Ahhh!!!! spiders!", cried the monster. "Do not worry" said our hero, "I have a sharp spoon".

Wow, I'm hooked! With epic character development like that, maybe I should be writing novels instead of programming tutorials?

Anyway, I digress. The point is that if I try to save this sentence wrapped in double quotes, problems occur, because the double-quote characters inside the text end the string early. But I can save the sentence as-is if I wrap it in single-quote characters instead. The next two code snippets demonstrate this.


In [15]:
# wrapping text with double quotes...
cool_story_bro = ""Ahhh!!!! spiders!", cried the monster. "Do not worry" said our hero, "I have a sharp spoon"."
print(cool_story_bro)


  File "<ipython-input-15-3ef74b7e2f84>", line 2
    cool_story_bro = ""Ahhh!!!! spiders!", cried the monster. "Do not worry" said our hero, "I have a sharp spoon"."
                          ^
SyntaxError: invalid syntax

In [7]:
# wrapping text with single quotes...
cool_story_bro = '"Ahhh!!!! spiders!", cried the monster."Do not worry" said our hero, "I have a sharp spoon".'

print(cool_story_bro)


"Ahhh!!!! spiders!", cried the monster."Do not worry" said our hero, "I have a sharp spoon".

Because I messed up, you have homework.

When I first wrote this example it took me about 10 minutes to get it working; I just couldn't figure out what the problem was!

It turns out that in the original draft my spider-maiming hero said the phrase:

“don’t worry” 

The ' character in don't was messing up my attempt to enclose the whole string within single quotes. Here, let me show you:


In [8]:
cool_story_bro = '"Ahhh!!!! spiders!", cried the monster."Don't worry" said our hero, "I have a sharp spoon".'

print(cool_story_bro)


  File "<ipython-input-8-5319fcad857d>", line 1
    cool_story_bro = '"Ahhh!!!! spiders!", cried the monster."Don't worry" said our hero, "I have a sharp spoon".'
                                                                  ^
SyntaxError: invalid syntax

So what was my genius solution? Well obviously we cheat and change the text!

“don’t worry” --> “do not worry”

Problem...err….solved?

Anyway, Python does have ways of handling such inputs. Your homework for this week is to figure out how to make my intended string work. If it takes you less than 10 minutes, congratulations: you figured it out faster than I did. :)

The str function

Just like the int() and float() functions, the str() function is a good way to convert one data-type to another. If I have an integer and I want to store it as a string, I can simply call str() and Python will do the rest. The code snippet below takes any float or integer and returns a string representation of that number.


In [9]:
def num_to_string(number):
    """takes a number of type float/int, returns string of that number"""
    return str(number)

# For an explanation of the next three lines of code, please see the 'calling functions' lecture. 
a = num_to_string(4555549099511) # large integer
b = num_to_string(-0.0044352334) # negative float
c = num_to_string(4.3e10) # scientific notation

print(a, type(a))
print(b, type(b))
print(c, type(c)) 

# and notice that we can use the float()/int() functions to convert the strings back to numbers just as easily...
print( float(c), type(float(c)) )


4555549099511 <class 'str'>
-0.0044352334 <class 'str'>
43000000000.0 <class 'str'>
43000000000.0 <class 'float'>

Why might you want to do this?

One reason you might want to convert a number to a string is that strings give you access to more operations and 'methods', which can make some tasks easier.

For example, suppose I want to find the first two digits of a number. Converting the number to a string makes this easy, since strings are iterable and can be indexed into, whereas numbers cannot. That's a lot of technical jargon right now, but don't worry, we shall cover indexing later.


In [4]:
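def first_two_digits(number):
    """Return the first two digits of a non-negative integer, as an int.
    (For a negative number such as -123 the slice would be '-1' instead.)"""
    n = str(number) # < -- convert number to string
    n = n[:2]       # < -- get the first two characters via slicing (more on slicing later).
    n = int(n)      # < -- converting n back to a number.
    
    return n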

print(first_two_digits(100000))
print(first_two_digits(933323))
print(first_two_digits(11))


10
93
11
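The claim that numbers cannot be indexed is easy to check. In this small sketch, indexing an int raises a TypeError while indexing its string version works. The try/except is only there so the error message is printed without stopping the cell.

```python
number = 933323

try:
    number[0]  # ints do not support indexing
except TypeError as err:
    print("TypeError:", err)

digits = str(number)
print(digits[0])    # 9, the first character
print(len(digits))  # 6, one character per digit
```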

Escape Characters

Text frequently has ‘meta-data’ attached to it. By meta-data in this context I mainly mean things like HTML tags: font colour, size, styling (e.g. bold, italic), and so on.

The normal way of handling this is to embed the formatting codes in the text itself. In other words, the text contains characters that the program displaying it has to parse as commands rather than print.

But for some applications you might want to literally print every character passed in. For example, directly below there are two lines: a pink heading, and some text with tags. Crucially, these two lines come from the same text; the difference in what we see is the difference between executing the HTML tags and literally printing them.

This is a heading

<h1 style="color:pink;">This is a heading</h1>

So, how does the computer know to interpret text in one way and not the other? Well, the solution is something called “escape characters”.

Just for completeness: to display those tags literally, I had to use several HTML escape sequences. I typed the following monstrosity:

&lt;h1 style=&quot;color:pink;&quot;&gt;This is a heading&lt;/h1&gt;

That's a complex line of jargon I couldn't have written without the help of an online escaping tool. So yes, escaping in HTML can be a bit tricky, but fortunately for us escape characters in Python are easier to work with.
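As it happens, Python's standard library can produce that HTML escaping for you. Here is a minimal sketch using html.escape and html.unescape, applied to the heading tag from above:

```python
import html

tag = '<h1 style="color:pink;">This is a heading</h1>'

# Replace &, <, > and quote-characters with HTML entities
escaped = html.escape(tag)
print(escaped)
# &lt;h1 style=&quot;color:pink;&quot;&gt;This is a heading&lt;/h1&gt;

# And convert the entities back into the original characters
print(html.unescape(escaped) == tag)  # True
```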

Consider the following lines of code.


In [5]:
a = "\\"
b = "\"


  File "<ipython-input-5-acf1ca423f4a>", line 2
    b = "\"
           ^
SyntaxError: EOL while scanning string literal

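At first glance this code seems perfectly fine, right? The variable a should be the string \\, and the variable b should just be a single backslash. But that isn't what happens: Python throws an error!

What's going on here? The backslash character (\) is an escape character in Python: it changes the meaning of the character that comes after it. In "\\" the first backslash escapes the second, so a actually holds just one backslash. In "\" the backslash escapes the closing quote, so Python never finds the end of the string, hence the error. To get Python to literally print \\ and \ we have to type: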


In [6]:
a = "\\\\"   # double \\
b = "\\"     # single \

print(a, b)
# Note that I didn't have to do any escaping in the comments; that's because Python ignores comments!


\\ \
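One way to see what escaping does is to count characters. The escaping backslash never ends up in the string itself. This quick check redefines a and b from the cell above so it runs on its own:

```python
a = "\\\\"  # typed four backslashes...
b = "\\"    # typed two backslashes...

print(len(a))        # 2, each escaped pair becomes one backslash
print(len(b))        # 1
print(b == chr(92))  # True, 92 is the code point of the backslash
```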

It is important to be aware of these Python features because, if you don't know about them, you can easily get caught out the moment you start working with complex strings. What follows is a (hopefully humorous) example of why you should care. Let's talk paths.


In [18]:
directory = "C:\Documents\pictures\selfies"
print(directory)


C:\Documents\pictures\selfies

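So let's imagine we are writing some code that saves a directory as a string for later use. If we print this particular directory there are no surprises; it works just as we would expect. (Python 3.12 and later will also warn about an "invalid escape sequence" on this line, which is a hint of what is coming.)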

But hold up, what if I wanted to send my girlfriend a naughty photo? Inside my 'selfies' folder I have a 'nudes' folder, and inside the 'nudes' folder I have a plethora of JPEGs: my little sausage pictured from a variety of different angles wearing an assortment of novelty hats.

“Wait, did he just say little?”

On this occasion, however, let's pretend I'm not a total weirdo (debatable). I want to send her something arty, something classy.

[scurries through folder...]
[finds ... 'tasteful.jpeg' ]

Alright, let's code that up and see what happens...


In [19]:
directory2 = "C:\Documents\pictures\selfies\nudes\tasteful.jpeg" 
print(directory2)


C:\Documents\pictures\selfies
udes	asteful.jpeg

Oh dear! It seems Python doesn't want me to send dick-pics over the internet after all! That's a pity, a big pity (wink wink).

What has gone wrong? When Python reads a string literal, every time it sees a backslash it looks at the next character to decide whether the pair is an escape sequence. In directory2 we have the following pairs: \D, \p, \s, \n and \t.

The first time we ran the code we got the output we expected, because \D, \p and \s are not special escape sequences, so Python leaves them alone. However, \n and \t are special: \n becomes a newline and \t becomes a tab, which is exactly what broke our output.
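To confirm that \n really turned into a single newline character inside the broken path, we can inspect a fragment of it directly. This is a minimal sketch using just the 'selfies\nudes' part of the path:

```python
fragment = "selfies\nudes"  # \n here is ONE newline character, not two

print(len(fragment))         # 12: 7 letters + 1 newline + 4 letters
print("\n" in fragment)      # True
print(fragment.split("\n"))  # ['selfies', 'udes']
```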

New line...

As an aside, \n is a very handy escape sequence to use within strings: it starts a new line, and splitting data across separate lines is something you will need to do often.

"{some text}\n{more text}"

Simple example:


In [1]:
greeting = "hello\nworld"
print(greeting)

# using \t (which is tab)
greeting = "hello\tworld"
print(greeting)

# There are other commands of course, but I feel that most of them are not useful enough to be worth teaching.


hello
world
hello	world

In short, \n is a newline and \t is a tab. So if we are trying to save or open files or folders on Windows whose names start with n or t, we can run into difficulties. (A few other letters, such as a, b, f, r and v, also form escape sequences after a backslash, so they can cause the same trouble.)

There are a few solutions to this problem. If you are dealing with paths specifically, a good choice is the os module: os.path.join inserts the correct separator between the parts of a path for you, so you never have to type that backslash yourself (the os module works on both Linux and Windows).

For example:
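Before reaching for any module, note that the most direct fix is to escape every backslash in the path by doubling it. Here is a sketch of the tasteful.jpeg path written that way:

```python
# Every backslash is doubled, so none of them starts an escape sequence
directory2 = "C:\\Documents\\pictures\\selfies\\nudes\\tasteful.jpeg"
print(directory2)
# C:\Documents\pictures\selfies\nudes\tasteful.jpeg

print("\n" in directory2)      # False, no newline sneaked in
print(directory2.count("\\"))  # 5 real backslashes
```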


In [4]:
import os

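directory = "C:\\Documents\\pictures\\selfies"  # backslashes doubled so none is read as an escape
photo_name = "santa_hat2.jpeg" 

## the bad way: hard-codes the Windows separator (and needs an escaped backslash)
path_to_photo_1 = directory + "\\" + photo_name

## the good way: os.path.join uses the separator of the current operating system
path_to_photo_2 = os.path.join(directory, photo_name)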

print(path_to_photo_1)
print(path_to_photo_2)


C:\Documents\pictures\selfies\santa_hat2.jpeg
C:\Documents\pictures\selfies\santa_hat2.jpeg
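One caveat: os.path.join uses the separator of the operating system the code is running on. The output above is what you would see on Windows. On Linux or macOS the same cell would join with a forward slash instead. The standard library exposes both rule sets as ntpath (Windows) and posixpath (Linux/macOS), and either can be imported on any platform, so a small sketch can show the difference. The /home/me/... path is just a made-up example.

```python
import ntpath     # the module behind os.path on Windows
import posixpath  # the module behind os.path on Linux and macOS

windows_path = ntpath.join("C:\\Documents\\pictures\\selfies", "santa_hat2.jpeg")
print(windows_path)  # C:\Documents\pictures\selfies\santa_hat2.jpeg

# hypothetical Linux folder, for illustration only
linux_path = posixpath.join("/home/me/pictures/selfies", "santa_hat2.jpeg")
print(linux_path)    # /home/me/pictures/selfies/santa_hat2.jpeg
```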

However, the method above only works for file paths. How can we solve this problem in a more general way?

Raw strings...

So what can we do if we want Python to ignore these escape sequences? The simplest solution is to put an 'r' immediately before the opening quote. The 'r' tells Python we want a raw string, in which backslashes are treated as ordinary characters.


In [5]:
string1 = r"\nevery\nword\nis\non\na\nnew\nline"  # notice the 'r' BEFORE the double-quote mark?
string2 =  "\nevery\nword\nis\non\na\nnew\nline"  # without the 'r', for comparison. 

print("The raw string version looks like this:\n", string1)
print("\n") 
print("The normal version of string looks like this:\n", string2)


The raw string version looks like this:
 \nevery\nword\nis\non\na\nnew\nline


The normal version of string looks like this:
 
every
word
is
on
a
new
line
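A raw string is just another way of *typing* the same string, so the raw version of a path and the doubled-backslash version compare equal. The sketch below also shows one gotcha. A raw string still cannot end in a single backslash, because that backslash would escape the closing quote.

```python
raw_path = r"C:\Documents\pictures\selfies\nudes\tasteful.jpeg"
escaped_path = "C:\\Documents\\pictures\\selfies\\nudes\\tasteful.jpeg"

print(raw_path == escaped_path)  # True, same characters either way
print(len(r"\n"), len("\n"))     # 2 1

# bad = r"C:\Documents\"           <- SyntaxError: the final \" escapes the quote
folder = r"C:\Documents" + "\\"  # one workaround: add the last backslash separately
print(folder)                    # C:\Documents\
```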

A Few More Operations...

Strings are a huge topic in Python, and we are going to have to come back to them later. For now, though, let me leave you with a few basic operations you can perform on strings...


In [12]:
# Repeating strings
     
#     {string} * {integer}

# Examples:

print("a" * 10)
print("abc" * 3)


aaaaaaaaaa
abcabcabc
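A couple of edge cases of repetition, shown as a quick sketch: multiplying by zero or by a negative number gives an empty string, and the multiplier must be an integer.

```python
print(repr("abc" * 0))   # '' (an empty string)
print(repr("abc" * -2))  # '' (negative counts also give an empty string)

try:
    "abc" * 2.0  # a float is not allowed, even a whole-number one
except TypeError as err:
    print("TypeError:", err)
```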

In [13]:
# Concatenation 

#    {string} + {string}

# Examples:

print("ab" + "c")
print("a" + "b" + "c")


abc
abc
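Concatenation only works between strings. Adding a number to a string raises a TypeError, and this is one place where the str() function from earlier comes in handy. A small sketch (the variable name age is just an illustration):

```python
age = 42

try:
    message = "I am " + age + " years old"  # str + int is not allowed
except TypeError as err:
    print("TypeError:", err)

message = "I am " + str(age) + " years old"  # convert first, then concatenate
print(message)  # I am 42 years old
```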

In [10]:
# Membership

#   {string} in {string}    

# Examples:

print("a" in "ab")
print("a" in "cb")
print("abc" in "aabbcc")  # must be an exact match: the characters have to appear together, in order.


True
False
False
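The 'exact match' rule means the characters must appear next to each other, in the same order, and with the same case. A few more checks as a sketch:

```python
print("abc" in "xxabcxx")  # True, contiguous and in order
print("acb" in "xxabcxx")  # False, right letters but wrong order
print("A" in "abc")        # False, membership is case-sensitive
print("" in "abc")         # True, the empty string is in every string
```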

HOMEWORK ASSIGNMENT

Name a variable "cool_story_bro" and then assign the following text to it as a string:

"Ahhh!!!! spiders!", cried the monster."Don't worry" said our hero, "I have a sharp spoon".

Once complete, print it.


In [ ]:
# Your answer here…